Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information
Genevieve Flaspohler 1,2, Nicholas Roy 1, and John W. Fisher III 1
1 Massachusetts Institute of Technology
Finally, Section D provides additional visualizations and discussion of experimental results.

The following section contains the detailed derivation of Lemmas 5.3-5.5 presented in the main text. Eqs. S22 and S25 follow by the triangle inequality. We can bound the final term; plugging this expression into Eq. S23, we have the recursion. Despite its continuous nature, the value function for any discrete, finite-horizon POMDP can be represented as a piecewise-linear and convex function of the belief. This property can be observed directly from Eq. 2 when integration is replaced by summation.
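The piecewise-linear and convex (PWLC) structure of the finite-horizon value function can be sketched with standard exact value iteration over alpha-vectors. The two-state POMDP below is a hypothetical toy model (not the model or parameters from this paper); the backup enumerates, for each action, every assignment of a successor alpha-vector to each observation, so the value function remains a maximum over linear functions of the belief:

```python
import itertools
import numpy as np

# Minimal sketch: exact finite-horizon POMDP value iteration with
# alpha-vectors on a hypothetical two-state, two-action, two-observation
# POMDP. The resulting value V(b) = max_alpha <b, alpha> is a maximum
# over linear functions of the belief, hence piecewise-linear and convex.

S, A, Z = 2, 2, 2      # states, actions, observations
gamma = 0.95

# Transition T[a][s, s'], observation O[a][s', z], reward R[a, s]
T = np.array([[[0.9, 0.1], [0.1, 0.9]],    # action 0: mostly stay
              [[0.5, 0.5], [0.5, 0.5]]])   # action 1: uniform reset
O = np.array([[[0.8, 0.2], [0.3, 0.7]],    # action 0: informative obs
              [[0.5, 0.5], [0.5, 0.5]]])   # action 1: uninformative obs
R = np.array([[1.0, -1.0],
              [0.0,  0.0]])

def backup(alphas):
    """One exact Bellman backup; returns the new set of alpha-vectors."""
    new = []
    for a in range(A):
        # g[z][i](s) = sum_{s'} T(s, a, s') O(a, s', z) alpha_i(s')
        g = [[T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
             for z in range(Z)]
        # Cross-sum: one choice of successor alpha-vector per observation
        for choice in itertools.product(range(len(alphas)), repeat=Z):
            new.append(R[a] + gamma * sum(g[z][choice[z]] for z in range(Z)))
    return new

# Horizon-3 value function, starting from the zero vector
alphas = [np.zeros(S)]
for _ in range(3):
    alphas = backup(alphas)

def V(b):
    """Value at belief b: an upper envelope of linear functions (PWLC)."""
    return max(float(np.dot(b, alpha)) for alpha in alphas)

beliefs = [np.array([p, 1.0 - p]) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
print([round(V(b), 3) for b in beliefs])
```

Because V is a pointwise maximum of linear functions, it is convex in the belief, mirroring the summation form of the Bellman backup referenced above.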